Incremental Clustering for Mining in a Data Warehousing Environment

Authors

Martin Ester

Hans-Peter Kriegel

Jörg Sander

Michael Wimmer

Xiaowei Xu

Abstract

Data warehouses provide many opportunities for performing data mining tasks such as classification and clustering. Typically, updates are collected and applied to the data warehouse periodically in batch mode, e.g., during the night. All patterns derived from the warehouse by a data mining algorithm then have to be updated as well. Because the databases are very large, it is highly desirable to perform these updates incrementally. In this paper, we present the first incremental clustering algorithm. Our algorithm is based on the clustering algorithm DBSCAN, which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database. Due to the density-based nature of DBSCAN, the insertion or deletion of an object affects the current clustering only in the neighborhood of this object. Thus, efficient algorithms can be given for incremental insertions into and deletions from an existing clustering. Based on the formal definition of clusters, it can be proven that the incremental algorithm yields the same result as DBSCAN. A performance evaluation of IncrementalDBSCAN on a spatial database as well as on a WWW-log database is presented, demonstrating the efficiency of the proposed algorithm. IncrementalDBSCAN yields significant speed-up factors over DBSCAN even for large numbers of daily updates in a data warehouse.
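
The locality property mentioned in the abstract can be illustrated with a small sketch. This is not the authors' IncrementalDBSCAN; it is a brute-force DBSCAN plus a hypothetical helper, `changed_core_points`, that lists the objects whose core status changes when one object is inserted. The names `eps`, `min_pts`, `region_query`, and `core_flags`, along with the sample data, are assumptions made for illustration only.

```python
from math import dist

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def core_flags(points, eps, min_pts):
    """True for each point whose eps-neighborhood holds at least min_pts points."""
    return [len(region_query(points, i, eps)) >= min_pts
            for i in range(len(points))]

def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN: one label per point, cluster id >= 0 or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reached from a core point: border
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is core, keep expanding
        cluster += 1
    return labels

def changed_core_points(points, new_point, eps, min_pts):
    """Existing points whose core status differs after inserting new_point."""
    before = core_flags(points, eps, min_pts)
    after = core_flags(points + [new_point], eps, min_pts)
    return [i for i in range(len(points)) if before[i] != after[i]]

# Illustrative data (assumed): a sparse row of three points and a distant cluster.
points = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (12, 10), (11, 11)]
new_point = (1, 1)
eps, min_pts = 1.1, 4

labels_before = dbscan(points, eps, min_pts)
changed = changed_core_points(points, new_point, eps, min_pts)
labels_after = dbscan(points + [new_point], eps, min_pts)
```

In this sketch only the point at index 1 becomes a core point, and it lies within `eps` of the inserted object; the distant cluster keeps the same members. Cluster ids may be renumbered by the full re-run. An incremental algorithm would exploit this locality by examining only the neighborhood of the update instead of re-clustering the whole database.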

Similar resources

Incremental Generalization for Mining in a Data Warehousing Environment

On a data warehouse, either manual analyses supported by appropriate visualization tools or (semi-) automatic data mining may be performed, e.g. clustering, classification and summarization. Attribute-oriented generalization is a common method for the task of summarization. Typically, in a data warehouse update operations are collected and applied to the data warehouse periodically. Then, all d...

A Density Based Dynamic Data Clustering Algorithm based on Incremental Dataset

Problem statement: Clustering and visualizing high-dimensional dynamic data is a challenging problem. Most existing clustering algorithms are based on static statistical relationships among the data. Dynamic clustering is a mechanism to adapt to and discover clusters in real-time environments. There are many applications such as incremental data mining in data warehousing applications, senso...

Rough Set Theory and Fuzzy Logic Based Warehousing of Heterogeneous Clinical Databases

Medical databases hold large amounts of data about patients and their medical conditions. Analyzing all these databases is a difficult task in the medical environment. To warehouse these databases and to analyze a patient's condition, we need an efficient data mining technique. In this paper, an efficient data mining technique for warehousing clinic...

An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving minimum sum-of-squares clustering problems using their difference of convex (DC) representations. The proposed algorithm is based on an incremental approach and applies the well-known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real-world data sets.

Clustering of Time Series Data

Time series data is of interest to most science and engineering disciplines and analysis techniques have been developed for hundreds of years. There have, however, in recent years been new developments in data mining techniques, such as frequent pattern mining, which take a different perspective of data. Traditional techniques were not meant for such pattern-oriented approaches. There is, as a ...

Publication date: 1998
